conduitR

Lifecycle: experimental

What is conduitR?

conduitR is an R package for metaproteomics: the large-scale identification and quantification of proteins from microbial communities (e.g. gut microbiome, soil, bioreactors). It provides a single, consistent toolkit for building search databases, processing DIA-NN output, linking proteins to taxonomy and function, running differential analysis, and producing visualizations.

The package powers Conduit (a Snakemake workflow for metaproteomics) and Conduit-GUI (a graphical interface to explore Conduit results), but you can use conduitR on its own for custom pipelines and analyses.

Typical workflow

  1. Database building — Get proteome FASTA files from UniProt by organism or proteome ID, concatenate them, and optionally create a custom FASTA file from a list of UniProt accessions.
  2. Import & structure — Convert DIA-NN parquet reports into a QFeatures object (precursors → peptides → protein groups) with assay links.
  3. Annotations — Attach taxonomy, Gene Ontology, KEGG, EggNOG, or CAZy annotations from UniProt and optional conduit annotation tables.
  4. Analysis — Run limma-style differential expression, over-representation (ORA), or GSEA; train classification/regression models (e.g. random forest, XGBoost).
  5. Visualization — Volcano plots, heatmaps, PCA biplots, taxonomic heat trees, sunbursts, and KEGG pathway figures, with consistent Conduit themes and palettes.
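
To make the precursors → peptides → protein groups hierarchy in step 2 concrete, here is a minimal base-R sketch that rolls a toy precursor table up two levels. The table, its column names, and the choice of summation are assumptions made for illustration. conduitR builds a QFeatures object for this, and the aggregation it actually applies may differ.

```r
# Toy precursor-level table; all names and values are made up for illustration.
precursors <- data.frame(
  precursor     = c("PEPTIDEA2", "PEPTIDEA3", "PEPTIDEB2", "PEPTIDEC2"),
  peptide       = c("PEPTIDEA", "PEPTIDEA", "PEPTIDEB", "PEPTIDEC"),
  protein_group = c("PG1", "PG1", "PG1", "PG2"),
  sample_1      = c(100, 50, 30, 10),
  sample_2      = c(80, 40, 20, 5)
)

# Precursors -> peptides: sum the intensities of precursors sharing a peptide.
# Summation is an assumption here, not necessarily what conduitR does.
peptides <- aggregate(
  cbind(sample_1, sample_2) ~ peptide + protein_group,
  data = precursors, FUN = sum
)

# Peptides -> protein groups: sum the peptide intensities within each group.
protein_groups <- aggregate(
  cbind(sample_1, sample_2) ~ protein_group,
  data = peptides, FUN = sum
)

print(protein_groups)
```

Each level keeps the key of the level above it (peptide, then protein group), which is the same parent-child relationship that the assay links in a QFeatures object record.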

Features

Data and databases

Data processing and structure

Statistical analysis

Visualization

Utilities

Installation

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("baynec2/conduitR")

Quick start

After installation, load the package and try a few entry points:

library(conduitR)

# Check that the UniProt API is reachable (required for downloads)
check_api_service()

# Validate UniProt IDs (no network needed)
validate_uniprot_accession_ids(c("P12345", "invalid_id", "A0A023GPI8"))

# Convert a DIA-NN parquet report to QFeatures (requires a local file)
# qf <- diann_to_qfeatures("path/to/report.parquet")
# plot_features_per_sample(qf, assay = "protein_groups")

# Run differential analysis (after building design/contrast)
# terms <- find_possible_contrast_terms(qf, "protein_groups", ~ group)
# res <- perform_limma_analysis(qf, "protein_groups", ~ group, "treatmentB - treatmentA")
# plot_volcano(res$top_table)
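
When writing a contrast string such as the one above, it helps to know how base R names design-matrix columns for a formula like ~ group. The sketch below uses only model.matrix() from the stats package with hypothetical group labels. Whether conduitR renames these columns before evaluating a contrast is not documented here, so treat the output of find_possible_contrast_terms() as the authority on which terms are accepted.

```r
# Hypothetical sample groups; the labels mirror the quick-start contrast.
group <- factor(c("treatmentA", "treatmentA", "treatmentB", "treatmentB"))

# With an intercept, the first factor level is absorbed into "(Intercept)".
design_intercept <- model.matrix(~ group)
print(colnames(design_intercept))

# Without an intercept, every level gets its own column (a group-means design).
design_means <- model.matrix(~ 0 + group)
print(colnames(design_means))

# A contrast is a set of weights on those columns. Applied to made-up
# log2 group means, "B minus A" gives the estimated log2 fold change.
contrast    <- c(grouptreatmentA = -1, grouptreatmentB = 1)
group_means <- c(grouptreatmentA = 10, grouptreatmentB = 12.5)
log2_fc     <- sum(contrast * group_means[names(contrast)])
print(log2_fc)
```

In plain base R the column names carry the variable name as a prefix (for example grouptreatmentB), which is why checking the available terms before writing a contrast avoids typos.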

Help pages and examples for each function are available in the built-in documentation, e.g. ?get_fasta_file, ?diann_to_qfeatures, ?perform_limma_analysis.
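
As a rough illustration of what an offline accession check involves, the sketch below tests IDs against the accession pattern published in the UniProt help pages. The helper looks_like_uniprot_accession() is hypothetical and is not part of conduitR. validate_uniprot_accession_ids() may apply different or additional rules.

```r
# Accession pattern from the UniProt help pages (6- or 10-character accessions),
# anchored so the whole string has to match.
uniprot_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Hypothetical helper (not a conduitR function): TRUE where the ID has the
# shape of a UniProt accession. It does not check that the entry exists.
looks_like_uniprot_accession <- function(ids) {
  grepl(uniprot_pattern, ids)
}

ids <- c("P12345", "invalid_id", "A0A023GPI8")
is_valid <- looks_like_uniprot_accession(ids)
print(data.frame(id = ids, is_valid = is_valid))
```

A format check like this only catches malformed strings. Confirming that an accession is current still requires a lookup against UniProt.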

Dependencies

Core dependencies include QFeatures (proteomics data structures), limma (differential expression), SummarizedExperiment, Biostrings, httr2, KEGGREST, rentrez, tidyr, dplyr, ggplot2, plotly, metacoder, arrow, and others for specific features. See the DESCRIPTION file for the full list.
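
If installation or loading fails, one quick way to see which of the packages named above are missing is the sketch below. It covers only the packages listed in this paragraph, not the full DESCRIPTION, and uses base R's requireNamespace(), which loads a package's namespace as a side effect when it is installed.

```r
# Packages named in the Dependencies paragraph (not the full DESCRIPTION list).
deps <- c("QFeatures", "limma", "SummarizedExperiment", "Biostrings", "httr2",
          "KEGGREST", "rentrez", "tidyr", "dplyr", "ggplot2", "plotly",
          "metacoder", "arrow")

# TRUE for each package whose namespace can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

missing_deps <- names(installed)[!installed]
if (length(missing_deps) > 0) {
  message("Not installed: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Several of these (QFeatures, limma, SummarizedExperiment, Biostrings, KEGGREST) are distributed through Bioconductor rather than CRAN, so they may need to be installed from there.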

Documentation

License

MIT License; see LICENSE for details.